2 resultados para SEQUENCES

em Cochin University of Science


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Knowledge discovery in databases is the non-trivial process of identifying valid, novel potentially useful and ultimately understandable patterns from data. The term Data mining refers to the process which does the exploratory analysis on the data and builds some model on the data. To infer patterns from data, data mining involves different approaches like association rule mining, classification techniques or clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group the related data for assessing properties and drawing conclusions. Most of the clustering algorithms act on a dataset with uniform format, since the similarity or dissimilarity between the data points is a significant factor in finding out the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert different formats into a uniform format. The research study explores the various techniques to convert the mixed data sets to a numerical equivalent, so as to make it equipped for applying the statistical and similar algorithms. The results of clustering mixed category data after conversion to numeric data type have been demonstrated using a crime data set. The thesis also proposes an extension to the well known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a data set corresponding to breast cancer. Moreover, another issue with the clustering process is the visualization of output. Different geometric techniques like scatter plot, or projection plots are available, but none of the techniques display the result projecting the whole database but rather demonstrate attribute-pair wise analysis

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The classical methods of analysing time series by Box-Jenkins approach assume that the observed series uctuates around changing levels with constant variance. That is, the time series is assumed to be of homoscedastic nature. However, the nancial time series exhibits the presence of heteroscedasticity in the sense that, it possesses non-constant conditional variance given the past observations. So, the analysis of nancial time series, requires the modelling of such variances, which may depend on some time dependent factors or its own past values. This lead to introduction of several classes of models to study the behaviour of nancial time series. See Taylor (1986), Tsay (2005), Rachev et al. (2007). The class of models, used to describe the evolution of conditional variances is referred to as stochastic volatility modelsThe stochastic models available to analyse the conditional variances, are based on either normal or log-normal distributions. One of the objectives of the present study is to explore the possibility of employing some non-Gaussian distributions to model the volatility sequences and then study the behaviour of the resulting return series. This lead us to work on the related problem of statistical inference, which is the main contribution of the thesis